Exploring the Effect of Bag-of-words and Bag-of-bigram Features on Turkish Word Sense Disambiguation
Abstract
Feature selection in Word Sense Disambiguation (WSD) is as important as the choice of algorithm for removing sense ambiguity. Bag-of-words (BoW) features capture the words neighboring the ambiguous target word without considering any relation between them. In this study, we investigate the effect of BoW and Bag-of-bigrams (BoB) features on Turkish WSD and compare the results with collocational features. The results suggest that BoW features yield better accuracy than BoB features in all cases. According to the comparison, collocational features are more effective than both BoW and BoB features for disambiguating word senses.
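As a rough illustration of the two feature types named in the abstract, the sketch below extracts BoW and BoB counts from a fixed window around a target word. It is not the authors' implementation: the function names `bow_features` and `bob_features`, the window size, the whitespace tokenization, the choice to include the target word in the bigram span, and the example sentence are all assumptions made for illustration.

```python
from collections import Counter


def bow_features(tokens, target_index, window=3):
    """Count the words within `window` positions of the target, ignoring order."""
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return Counter(left + right)


def bob_features(tokens, target_index, window=3):
    """Count adjacent word pairs (bigrams) in the same window.

    Assumption: the target word itself stays in the span, so bigrams
    that contain it are counted too.
    """
    start = max(0, target_index - window)
    span = tokens[start:target_index + 1 + window]
    return Counter(zip(span, span[1:]))


# Hypothetical example sentence; "yüz" is ambiguous in Turkish
# (e.g. "face", "hundred", "swim").
tokens = "çocuk her sabah denizde yüz metre yüzer".split()
target = tokens.index("yüz")

bow = bow_features(tokens, target, window=2)
bob = bob_features(tokens, target, window=2)
```

With a window of 2, `bow` holds the four neighboring words as unordered counts, while `bob` holds the four adjacent pairs in the span, so only the bigram features retain local word order.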
Similar resources
Exploring feature sets for Turkish word sense disambiguation
This paper presents an exploration and evaluation of a diverse set of features that influence word-sense disambiguation (WSD) performance. As one of the most crucial steps in natural language processing (NLP), WSD has the potential to improve many NLP tasks. It is known that exploiting effective features and removing redundant ones helps improve the results. There are two groups of f...
Word Sense Disambiguation Using WordNet Relations
In this paper, the “Weighted Overlapping” Disambiguation method is presented and evaluated. This method extends Lesk’s approach to disambiguate a specific word appearing in a context (usually a sentence). Sense definitions of the specific word, “Synset” definitions, the “Hypernymy” relation, and definitions of the context features (words in the same sentence) are retrieved from the WordNe...
The WSD Development Environment
In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, as well as the results of first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning ...
An Approach to Word Sense Disambiguation Combining Modified Lesk and Bag-of-words
In this paper, we propose a technique to find the meaning of words through Word Sense Disambiguation using supervised and unsupervised learning. This limitation of information is the main flaw of the supervised approach. Our proposed approach aims to overcome the limitation using a learning set that is enriched dynamically with new data. We introduce a mixed methodology having “M...
Complex, Corpus-Driven, Syntactic Features for Word Sense Disambiguation
Although syntactic features offer more specific information about the context surrounding a target word in a Word Sense Disambiguation (WSD) task, in general, they have not distinguished themselves much above positional features such as bag-of-words. In this paper we offer two methods for increasing the recall rate when using syntactic features on the WSD task by: 1) using an algorithm for disc...